Abstract

It is widely recognized that the cleaving rate of a restriction enzyme
on target DNA sequences is several orders of magnitude faster than the
maximal rate calculated from diffusion-limited theory. It was therefore
commonly assumed that the target site interaction of a restriction
enzyme with DNA has to occur via two steps: one-dimensional diffusion
along a DNA segment, and long-range jumps coming from
association/dissociation events. We propose here a stochastic model for
this reaction which comprises a series of 1D diffusions of a restriction
enzyme on non-specific DNA sequences interrupted by 3D excursions in
the solution until the target sequence is reached. This model provides
an optimal finding strategy which explains the fast association rate.
Modeling the excursions by uncorrelated random jumps, we recover the
expression of the mean time required for target site association to occur
given by Berg et al. (Berg, et al., 1981), and we explicitly give several
physical quantities describing the stochastic pathway of the enzyme.
For competitive target sites we calculate two quantities: processivity
and preference. By comparing these theoretical expressions to recent
experimental data obtained for EcoRV-DNA interaction, we quantify:
i) the mean residence time per binding event of EcoRV on DNA for a
representative 1D diffusion coefficient, ii) the average lengths of DNA
scanned during the 1D diffusion (during one binding event and during
the overall process), iii) the mean time and the mean number of visits
needed to go from one target site to the other. Further, we evaluate
the dynamics of DNA cleavage with regard to the probability for the
restriction enzyme to perform another 1D diffusion on the same DNA
substrate following a 3D excursion.

Introduction

Genetic events often depend on the interaction of a restriction enzyme
with a target DNA sequence. The restriction enzyme first has to find
this sequence on DNA, and the search mechanism long remained mysterious.
The simplest model treats it as a reaction between two point-like
entities, the restriction enzyme and its target DNA sequence, in a solute
volume. However, kinetic measurements of reactivity show that the reaction
occurs at an extraordinarily rapid rate, far above the three-dimensional
diffusion limit (Richter, et al., 1974; Riggs, et al., 1970). To account
for this, it was proposed that the reaction occurs via a "facilitated"
diffusion process (Von Hippel, et al., 1989). The restriction enzyme first
binds to DNA at a non-specific site, then performs a one-dimensional random
walk until it reaches the target DNA sequence. In this picture, it is by
scanning the DNA and not by diffusing in a 3D volume that the restriction
enzyme reaches its target site. However, results from experiments
(Szczelkun, et al., 1996) using two interlinked rings of DNA (plasmids,
each containing a target site for the restriction enzyme EcoRV) rule out
this possibility: the mechanism of target site localization does not
consist of a single 1D diffusion along DNA. If it did, the EcoRV enzyme
would cleave the DNA of only one of the two rings, contrary to what is
observed. Moreover, molecular crowding in vivo is expected to hinder any
long 1D scanning of the DNA (Wenner, et al., 1999).



To account for the fast association rate, several strategies have been
proposed and modeled from experimental data (Berg, et al., 1981;
Von Hippel, et al., 1989; Winter, et al., 1981). Four major translocation
processes were identified (we recall that translocation is the overall
process by which a protein goes from one DNA sequence to another). The
first, the "sliding" process, corresponds to the pure one-dimensional
diffusion discussed above. The second, the "intersegmental transfer"
(Milsom, et al., 2001), involves dimer proteins having two binding sites.
The restriction enzyme bound to DNA at the first site binds its second
site to a remote DNA sequence and then dissociates from the first one.
The two other translocation processes are induced by several
dissociation-reassociation events. Depending on whether the enzyme
rebinds near the departure site or at an uncorrelated site, the
translocation process is called "hopping" or "jumping"
(Halford, et al., 2002). Which of these translocation processes, or which
combination of them, describes the mechanism of target site localization
on DNA is still an open question.

Understanding the translocation process is of great importance as it
governs the kinetics of genetic events (Misteli, 2001). Several
experimental investigations were carried out in order to elucidate the
pathway followed by a restriction enzyme to reach a single target site.
Some of them quantify the rate of cleavage reactions by varying the length
of the DNA strand (for a review, see (Shimamoto, 1999)) or the salt
concentration (Winter, et al., 1981; Lohman, 1986), which affects the
binding properties of DNA-affine proteins on non-specific sequences. These
experimental results allow one to reject the possibility of a single
translocation process, but cannot fully describe the structure of the
combined process. Berg et al. (Berg, et al., 1981) proposed a theoretical
approach to quantify the relevant parameters of the localization of a
single target site. Their model describes the overall searching process,
comprising the primary encounter of the enzyme with a DNA domain and the
secondary encounter of the enzyme with the target site. Here we deal with
the unexplored case of two competitive target sites in order to
quantitatively analyze the physical properties of the secondary encounter,
i.e. the target site localization of a restriction enzyme initially bound
to the DNA. Only the study of such systems gives access to the detailed
pathway of the secondary encounter with well-defined initial conditions.
Related experimental studies with two differentiable target sites located
at well-defined positions on the DNA strand (Langowski, et al., 1983;
Terry, et al., 1985; Stanford, et al., 2000) allow one to handle two
descriptive quantities: the preference and the processivity of the
restriction enzymes. The preference is the ratio of the number of enzymes
which react with one target site to the number of enzymes which react with
the other target site. The processivity is the fraction of enzymes which
react successively with the two target sites. To extract from these
experiments physical parameters of the enzyme pathway, such as the
proportion of time spent by the enzyme on the DNA, the average number of
dissociation/association events and the average DNA length scanned prior
to the target site localization, it is necessary to build a reliable
physical model that can mimic the biological situation.

Here, we propose a simple and general stochastic model to describe the
kinetics of target site localization of a restriction enzyme on DNA, which
explicitly combines any 1D motion along the DNA and 3D excursions in the
solution. In the particular case of 1D diffusive motion, our model allows
one to recover the analytic expression for the mean time needed for the
enzyme to find a single target site on DNA given by Berg et al.
(Berg, et al., 1981). This mean time presents an optimum, corresponding to
the quickest finding strategy, which can be discussed in the cases of
point-like and extended target sites. The model explicitly gives the mean
number of enzyme visits on the DNA and the proportion of the DNA visited
until the target site is localized. For two target sites, our model
provides theoretical expressions for the preference and the processivity
factors. These expressions involve two unknown physical parameters: the 1D
and 3D residence frequencies $\lambda$ and $\lambda'$. We show that
$\lambda$ is easily evaluated by confronting the theoretical preference
with experimental data. The second unknown parameter $\lambda'$, of minor
physical relevance, is extracted from the assumption, justified below,
that the searching strategy is optimal. The comparison of the theoretical
processivity factor to experimental data allows us to predict the value of
a dynamic-associated parameter: the probability $\pi_r$ that after an
excursion the enzyme will associate to the same DNA substrate it has left.

The article is organized as follows. First, we give the general background
of such an approach and present the hypotheses of our model. Then we
deduce the mean search time from the study of the first passage time
density, and for the cases of point-like and extended target sites we
discuss the optimal strategy to find the target site as quickly as
possible. We give the condition of existence of this optimal strategy as
well as its quantitative characteristics. We discuss the value of the
optimal 1D frequency and evaluate finite-size effects. Equation 12 gives
the mean target site localization time for an enzyme which starts from a
random position on the DNA. The complete distribution of the number of
visits of the protein on the DNA is explicitly determined. In particular,
its mean value is given by Eq. 18. The average number of distinct bp
visited on the DNA is given by Eq. 21. Second, the preference and the
processivity factors of the restriction enzyme for two target sites, as
functions of the distance between the target sites, are obtained and
compared with experimental results concerning EcoRV
(Stanford, et al., 2000). The comparison gives us the residence time on
the DNA per binding event and other related physical quantities. We then
numerically obtain the mean time needed for the enzyme to go from the
first target site to the second target site, and the mean number of visits
on the DNA substrate before the two target sites are cleaved. In
conclusion, we discuss the predicted value of $\pi_r$ defined previously.

Model 

We present our model in the framework of a generic protein searching
for its target site on the DNA. The case of dimer proteins which can bind
simultaneously to two target sites is not investigated, in order to
discard intersegmental transfers. As a first approximation, the "hopping"
translocation process is assumed to be represented effectively in the 1D
diffusion of the protein. Then, the pathway followed by the protein,
considered as a point-like particle, is a succession of 1D diffusions
along the DNA strand and 3D excursions in the surrounding solution
(Fig. 1). The time spent by the protein on a DNA strand during each
binding event is assumed to follow an exponential law with dissociation
frequency $\lambda$. This law relies on the commonly used Markovian
description of the chemical bond. The probability for the protein to be
still bound to DNA at time $t$ (knowing that it is bound at $t = 0$) is
then $P(T > t) = \exp(-\lambda t)$, and the probability that the protein
leaves the DNA at a random time $T$ in the interval $[t, t + dt]$ is
$P(t < T < t + dt) = \lambda \exp(-\lambda t)\,dt$.
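
The exponential residence-time law can be checked numerically. The following minimal sketch assumes a hypothetical dissociation frequency (`lam = 2.0` per second, chosen for illustration and not taken from this work) and compares sampled residence times with $P(T > t) = \exp(-\lambda t)$ and with the mean residence time $1/\lambda$. The function names are ours.

```python
import math
import random

def survival_probability(lam, t):
    """P(T > t) for an exponential residence time with dissociation frequency lam."""
    return math.exp(-lam * t)

def sample_residence_times(lam, n, seed=0):
    """Draw n residence times from the exponential law (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    return [rng.expovariate(lam) for _ in range(n)]

lam = 2.0      # hypothetical dissociation frequency (1/s), not a fitted value
t_check = 0.5  # time (s) at which the survival probability is compared
times = sample_residence_times(lam, 100_000)

# Empirical mean residence time, expected to approach 1/lam
mean_time = sum(times) / len(times)
# Empirical fraction of binding events still bound at t_check
frac_bound = sum(t > t_check for t in times) / len(times)

print(f"mean residence time: {mean_time:.4f} s (theory {1 / lam:.4f} s)")
print(f"P(T > {t_check}): {frac_bound:.4f} (theory {survival_probability(lam, t_check):.4f})")
```

The same sampler applies to the 3D excursion times if `lam` is replaced by the excursion frequency.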

The one-dimensional motion on DNA can be modeled as a continuous
Brownian motion with diffusion coefficient $D$. As is usually done
(see for instance (Jeltsch, et al., 1998)), we assume that the extremities
of the DNA chain act on the protein as reflecting boundaries. Thus, a
protein reaching an extremity during a binding event is reflected and
continues its one-dimensional motion. The target site is a specific
sequence of base pairs (e.g. the restriction enzyme EcoRV recognizes the
sequence GATATC (Taylor, et al., 1989)). The reaction occurs when the
reactive domain of the protein matches the target site sequence. To a
first approximation, we model the target site as a perfect reactive point
(Fig. 2). The reaction is assumed to be infinitely fast as soon as the
protein meets the target site. Note that in this case the protein can find
the target site only by diffusing along DNA. The precise mechanism of this
elementary act is still subject to discussion. In particular, the profile
of the DNA-protein interaction potential is unknown, and could be
attractive over an extended area. It is then reasonable also to treat the
case where the target site is a zone of finite extension $2r$ (Fig. 3). In
that case the target site can be reached either by diffusion along the
DNA, or directly from a 3D excursion. This second approach, developed
further below, gives rise to a strongly different behavior of the search
time.

As a first approximation, the excursions are assumed to be uncorrelated in
space. Hence, when dissociating from DNA, a protein will rebind at a random
position. In other words, the position reached on DNA after an excursion
is uniformly distributed along the whole DNA molecule. It has been
suggested (Winter, et al., 1981) that, for long molecules that are not
excessively concentrated in solution, the DNA strands form disjoint
domains diluted in the medium. A protein which reaches such a DNA domain
will be trapped in it. In this case excursions might be correlated due to
the geometric configuration of the DNA. As the configuration of a polymer
strand in solution is a random coil, even short three-dimensional
excursions can lead to a long effective translocation of the linear
position of the protein on DNA. Consequently, a small number of long-range
transitions is sufficient to decorrelate the protein position on DNA.

We now introduce three basic quantities used in this work. The first one,
$P_{3D}(t)$, is the probability density that the protein, in the solution
at time $t = 0$, will bind DNA at time $t$ at a random position:

$$P_{3D}(t) = \lambda' \exp(-\lambda' t) \tag{1}$$

where the distribution of the time spent during an excursion is assumed
to follow an exponential law with frequency $\lambda'$, corresponding to a
mean time spent in the surrounding solution $\tau' = 1/\lambda'$.
Accounting rigorously for the entire law is beyond the scope of this work.
Rather, we concentrate here on the characteristic time $\tau'$, which
exists and is finite as soon as the system is confined, and on the
exponential tail of the law, which proves to be valid in most plausible
geometries. We will show that this model captures the main relevant
characteristics of the problem.

The second quantity, $P_{1D}(t|x)$, is the conditional probability density
that the protein, being on the DNA at position $x$ at time $t = 0$, will
dissociate at time $t$ without any encounter with the target site.
Assuming that the dissociation rate is independent of the state of the
protein, one has:

$$P_{1D}(t|x) = \lambda \exp(-\lambda t)\, Q(t|x) \tag{2}$$

where $Q(t|x)$ is the conditional probability that the protein, starting
from the position $x$, has not met the target site up to time $t$ during
its one-dimensional diffusion. Introducing $j(t|x)$ as the probability
density of the first passage to the target site position at time $t$
without dissociation, one gets $Q(t|x) = 1 - \int_0^t j(t'|x)\,dt'$.

The last quantity, $\tilde{P}_{1D}(t|x)$, is the conditional probability
density that the protein, being on DNA at position $x$ at time $t = 0$,
will find the target site for the first time at time $t$ during its
one-dimensional diffusion, without leaving the DNA:

$$\tilde{P}_{1D}(t|x) = \exp(-\lambda t)\, j(t|x). \tag{3}$$

Given these quantities, the first passage density of the protein to the
target site can be calculated, first in the case of one target site, and
then for two target sites.

First passage density 

By calculating the first passage density, we obtain the mean time needed
for the protein to find its specific target site, as well as all
associated moments. We assume that the protein starts at $t = 0$ bound to
the DNA at position $x$. We consider a generic event (Fig. 2) with $n - 1$
excursions in the bulk, residence times on DNA $t_1, \ldots, t_n$ and
excursion times $\tau_1, \ldots, \tau_{n-1}$. The probability density of
such an event, for which the protein finds the target site for the first
time at time $t = \sum_{i=1}^{n} t_i + \sum_{i=1}^{n-1} \tau_i$, is:

$$P_n(t|x) = \tilde{P}_{1D}(t_n)\, P_{3D}(\tau_{n-1})\, P_{1D}(t_{n-1}) \cdots P_{1D}(t_2)\, P_{3D}(\tau_1)\, P_{1D}(t_1|x) \tag{4}$$

where $P_{1D}(t)$ and $\tilde{P}_{1D}(t)$ are averaged over the initial
position of the protein: $P_{1D}(t) = \langle P_{1D}(t|x) \rangle_x$ and
$\tilde{P}_{1D}(t) = \langle \tilde{P}_{1D}(t|x) \rangle_x$. We denote by
$M$ the DNA length on the "left" side of the target site and by $L$ the
length on the "right" side of the target site. The average of a function
$f$ over the initial position $x$ is given by
$\langle f(t|x) \rangle_x = \frac{1}{L + M} \int_{-M}^{L} f(t|x)\,dx$.

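To obtain the density of first passage at the target site, $F(t|x)$, we sum
over all possible numbers of excursions and we integrate over all intervals
of time, ensuring that $t = \sum_{i=1}^{n} t_i + \sum_{i=1}^{n-1} \tau_i$.
The average over the initial position of the protein,
$F(t) = \langle F(t|x) \rangle_x$, can be expressed as:

$$F(t) = \sum_{n=1}^{\infty} \int_0^{\infty} dt_1 \cdots dt_n\, d\tau_1 \cdots d\tau_{n-1}\; \delta\!\left( \sum_{i=1}^{n} t_i + \sum_{i=1}^{n-1} \tau_i - t \right) \tilde{P}_{1D}(t_n) \left[ \prod_{i=1}^{n-1} P_{3D}(\tau_i) \right] \left[ \prod_{i=1}^{n-1} P_{1D}(t_i) \right] \tag{5}$$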
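
Taking the Laplace transform of $F(t)$,
$\hat{F}(s) = \int_0^{\infty} dt\, e^{-st} F(t)$, we obtain:

$$\hat{F}(s) = \langle \hat{j}(\lambda + s|x) \rangle_x \left[ 1 - \frac{1 - \langle \hat{j}(\lambda + s|x) \rangle_x}{(1 + s/\lambda)(1 + s/\lambda')} \right]^{-1} \tag{6}$$

$\hat{j}(s|x)$ being the Laplace transform of $j(t|x)$. Indeed, each
convolution becomes a product, the transform of the 3D excursion density
is $1/(1 + s/\lambda')$, that of the averaged non-reactive 1D density is
$(1 - \langle \hat{j}(\lambda + s|x) \rangle_x)/(1 + s/\lambda)$, and the
sum over $n$ is a geometric series. This expression completely solves our
problem for any 1D motion. We will see in the next section that the main
quantities of physical interest can be extracted from this formula.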

Optimal search strategy 

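The relevant quantity to describe the protein/DNA association reaction
is the mean time $\langle \mu \rangle$ necessary for the protein to find
the target site (see above). This mean time is obtained from the
derivative of the Laplace transform of the first passage density by the
following relation:

$$\langle \mu \rangle = - \left. \frac{\partial \hat{F}(s)}{\partial s} \right|_{s=0} \tag{7}$$

which combined with Eq. 6 gives:

$$\langle \mu \rangle = \frac{1 - \langle \hat{j}(\lambda|x) \rangle_x}{\langle \hat{j}(\lambda|x) \rangle_x} \left( \frac{1}{\lambda} + \frac{1}{\lambda'} \right) \tag{8}$$

This expression is very general and holds for any 1D motion. Now, we
calculate this quantity for a free 1D diffusion. The Laplace transform of
the one-dimensional first passage probability density is well known (see
the textbook (Redner, 2001)):

$$\text{if } x > 0, \quad \hat{j}(\lambda|x) = \cosh\!\left( \sqrt{\tfrac{\lambda}{D}}\, x \right) - \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, L \right) \sinh\!\left( \sqrt{\tfrac{\lambda}{D}}\, x \right) \tag{9}$$

$$\text{if } x < 0, \quad \hat{j}(\lambda|x) = \cosh\!\left( \sqrt{\tfrac{\lambda}{D}}\, x \right) + \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, M \right) \sinh\!\left( \sqrt{\tfrac{\lambda}{D}}\, x \right) \tag{10}$$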

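Averaging over $x$, we finally obtain

$$\langle \hat{j}(\lambda|x) \rangle_x = \frac{1}{M + L} \sqrt{\frac{D}{\lambda}} \left[ \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, L \right) + \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, M \right) \right] \tag{11}$$

where $D$ is the one-dimensional diffusion coefficient. Then the mean
search time takes the following form:

$$\langle \mu \rangle = \left( \frac{1}{\lambda} + \frac{1}{\lambda'} \right) \left[ \frac{\sqrt{\lambda/D}\,(L + M)}{\tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, L \right) + \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, M \right)} - 1 \right] \tag{12}$$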



Some comments about this expression are in order. First, we recover in a
simple and direct way the original result of Berg et al.
(Berg, et al., 1981), obtained from a complete description of the 3D
motion (Berg, et al., 1976; Berg, et al., 1977; Berg, et al., 1978).

Second, this quantity is minimum when the target site is centered (as
expected for symmetry reasons).

Third, as soon as the length of the DNA strand is large enough (more
precisely as soon as $\sqrt{\lambda/D}\, L \gg 1$ or
$\sqrt{\lambda/D}\, M \gg 1$), $\langle \mu \rangle$ grows linearly with
the length of the DNA strand. This reflects the efficiency of the combined
1D and 3D motion when compared to the quadratic growth obtained in the
case of pure sliding. In particular, the boundary effects are negligible
for this quantity as soon as the overall length is large enough.

Last, this expression is valid for a very large class of 3D motions. More
precisely, it holds as soon as the mean first return time $\tau_{3D}$
corresponding to the 3D motion is finite and independent of the departure
and arrival points. The corresponding expression of the mean first passage
time is obtained by replacing $\lambda'$ by $1/\tau_{3D}$.
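
As an illustration of these comments, the sketch below evaluates the mean search time in the form $(1/\lambda + 1/\lambda')\,[\sqrt{\lambda/D}\,(L + M)/(\tanh(\sqrt{\lambda/D}\,L) + \tanh(\sqrt{\lambda/D}\,M)) - 1]$ and cross-checks it against a numerical derivative, at $s = 0$, of the Laplace-transformed first passage density. All parameter values are hypothetical and chosen only for illustration (lengths in bp, times in s), and the function names are ours.

```python
import math

def j_avg(s, D, L, M):
    """Position-averaged Laplace transform of the 1D first-passage density to the
    target, with reflecting ends at distances L and M from the target."""
    k = math.sqrt(s / D)
    return (math.tanh(k * L) + math.tanh(k * M)) / (k * (L + M))

def F_hat(s, lam, lam_p, D, L, M):
    """Laplace transform of the averaged first-passage density for alternating
    1D scans (frequency lam) and 3D excursions (frequency lam_p)."""
    J = j_avg(lam + s, D, L, M)
    return J / (1.0 - (1.0 - J) / ((1.0 + s / lam) * (1.0 + s / lam_p)))

def mean_search_time(lam, lam_p, D, L, M):
    """Closed-form mean search time for free 1D diffusion."""
    k = math.sqrt(lam / D)
    visits = k * (L + M) / (math.tanh(k * L) + math.tanh(k * M))
    return (1.0 / lam + 1.0 / lam_p) * (visits - 1.0)

def mean_from_derivative(lam, lam_p, D, L, M, h=1e-5):
    """Mean search time as minus the central-difference derivative of F_hat at s = 0."""
    return -(F_hat(h, lam, lam_p, D, L, M) - F_hat(-h, lam, lam_p, D, L, M)) / (2.0 * h)

# Hypothetical parameters: frequencies in 1/s, D in bp^2/s, lengths in bp
lam, lam_p, D = 2.0, 5.0, 1.0e4

mu_closed = mean_search_time(lam, lam_p, D, L=300.0, M=700.0)
mu_numeric = mean_from_derivative(lam, lam_p, D, L=300.0, M=700.0)
mu_centered = mean_search_time(lam, lam_p, D, L=500.0, M=500.0)
print(mu_closed, mu_numeric, mu_centered)

# Doubling a long strand roughly doubles the search time (linear growth),
# whereas pure sliding would multiply it by four (quadratic growth).
mu_short = mean_search_time(lam, lam_p, D, L=5.0e4, M=5.0e4)
mu_long = mean_search_time(lam, lam_p, D, L=1.0e5, M=1.0e5)
growth_ratio = mu_long / mu_short
print(growth_ratio)
```

The agreement between `mu_closed` and `mu_numeric` is only a consistency check between the two expressions; it says nothing about the validity of the uncorrelated-excursion assumption itself.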

We now come to an important question, already present in the seminal
work of Berg et al. (Berg, et al., 1981) and recently addressed by Slutsky
et al. (Slutsky, et al., 2004), which concerns the optimum strategy for
such a coupled motion. Indeed, it seems reasonable that
$\langle \mu \rangle$ is large both for $\lambda$ very large (in the limit
of infinite $\lambda$, the protein is never on the DNA) and for $\lambda$
very small (pure sliding limit). It has been suggested from qualitative
arguments (Slutsky, et al., 2004) that the mean search time is minimum when
the protein spends equal times bound to the DNA and freely diffusing in the
bulk.

Here, we address more precisely this question of minimizing the mean
search time with respect to the 1D frequency $\lambda$. This is the only
truly "adjustable" parameter (it depends strongly on the structure of the
protein): $\lambda'$ depends on the properties of the environment and will
not vary significantly from one protein to another. The 1D diffusion
coefficient $D$ is a specific quantity, and optimizing the search time with
respect to this parameter is trivial: $D$ should be as large as possible
(note that $D$ and $\lambda$ are assumed to be independent).

The sign of the derivative of the mean search time at $\lambda = 0$ gives
the following criterion for having a minimum:

$$\lambda' > 15 D\, \frac{L^2 + M^2 - LM}{L^4 + M^4 + 4LM(L^2 + M^2) - 9M^2L^2} \tag{13}$$

In fact, it can be shown that this sufficient condition is also necessary.
If this condition is fulfilled, a careful analysis of the implicit
equation satisfied by the frequency at the minimum leads to the following
expansion for large $\ell = L + M$:

$$\lambda = \lambda' - 4\, \frac{\sqrt{D \lambda'}}{\ell} - \frac{8D}{\ell^2} + \cdots \tag{14}$$

Equations 13 and 14 refine the result of Slutsky et al., which however
holds true in the large $\ell$ limit, or more precisely for
$\sqrt{\lambda/D}\, \ell \gg 1$. For intermediate values of $\ell$,
boundary effects become important and the minimum can be significantly
different.
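
The optimum can also be located numerically. The sketch below is a minimal illustration with hypothetical parameter values: it minimizes the mean search time, taken in the form $(1/\lambda + 1/\lambda')\,[\sqrt{\lambda/D}\,\ell/(\tanh(\sqrt{\lambda/D}\,L) + \tanh(\sqrt{\lambda/D}\,M)) - 1]$, by a golden-section search (our choice of method, not the authors') and compares the result with the leading large-$\ell$ behaviour $\lambda' - 4\sqrt{D\lambda'}/\ell$. It also evaluates, for a short strand, the existence criterion written as $\lambda' > 15D(L^2 + M^2 - LM)/(L^4 + M^4 + 4LM(L^2 + M^2) - 9M^2L^2)$, which is our reading of the condition and should be treated as an assumption.

```python
import math

def mean_search_time(lam, lam_p, D, L, M):
    """Mean search time for free 1D diffusion combined with 3D excursions."""
    k = math.sqrt(lam / D)
    visits = k * (L + M) / (math.tanh(k * L) + math.tanh(k * M))
    return (1.0 / lam + 1.0 / lam_p) * (visits - 1.0)

def golden_section_min(f, lo, hi, tol=1e-9):
    """Locate the minimum of a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]: reuse c as the new upper probe
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: reuse d as the new lower probe
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def minimum_exists(lam_p, D, L, M):
    """Existence criterion for an optimal 1D frequency (assumed form, see lead-in)."""
    num = L**2 + M**2 - L * M
    den = L**4 + M**4 + 4 * L * M * (L**2 + M**2) - 9 * M**2 * L**2
    return lam_p > 15.0 * D * num / den

# Long strand, hypothetical values: lam_p in 1/s, D in bp^2/s, lengths in bp
lam_p, D = 5.0, 1.0e4
L = M = 5.0e4
ell = L + M
lam_opt = golden_section_min(lambda lam: mean_search_time(lam, lam_p, D, L, M), 1e-3, 50.0)
lam_asym = lam_p - 4.0 * math.sqrt(D * lam_p) / ell
print(lam_opt, lam_asym)

# Short strand (L = M = 100 bp): the criterion threshold is 15 D / L^2 = 15 per second
def initial_slope_sign(lam_p_value):
    """Sign of the finite-difference slope of the mean search time near lam = 0."""
    f = lambda lam: mean_search_time(lam, lam_p_value, D, 100.0, 100.0)
    return 1 if f(0.02) > f(0.01) else -1

print(minimum_exists(5.0, D, 100.0, 100.0), initial_slope_sign(5.0))
print(minimum_exists(30.0, D, 100.0, 100.0), initial_slope_sign(30.0))
```

A negative initial slope means that dissociating occasionally already beats pure sliding, which is the situation in which an optimal frequency exists.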

The value of $\langle \mu \rangle$ at the minimum is particularly
interesting. We compare it to the case of pure sliding, where
$\langle \mu_s \rangle = \ell^2/(3D)$:

$$\frac{\langle \mu \rangle}{\langle \mu_s \rangle} \simeq \frac{6}{\ell} \sqrt{\frac{D}{\lambda}} \tag{15}$$

The efficiency of the 3D-mediated strategy is therefore much more
pronounced when the DNA chain is long. For example, using the $\lambda$
and $D$ values obtained in the Results section and for a DNA substrate of
length $10^6$ bp, the mean target site localization time for pure sliding
is one thousand-fold greater than that predicted by our model.

Further quantitative features of reactive pathways 

In this paragraph, we compute two quantities which characterize more 
precisely the nature of the reactive paths. These quantities are of special in- 
terest as they could be experimentally measured using single-molecule tech- 
niques. 

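The first quantity is the distribution $p(N)$ of the number of visits on
DNA required before reaching the target site. We recall that in the
initial state the protein is bound to the DNA, therefore $N \geq 1$. The
distribution can be obtained by slightly modifying the expression of the
first passage density, Eq. 5:

$$p(N) = \int_0^{\infty} dt\, \langle P_N(t|x) \rangle_x = \left[ \prod_{i=1}^{N-1} \int_0^{\infty} d\tau_i\, P_{3D}(\tau_i) \right] \left[ \prod_{i=1}^{N-1} \int_0^{\infty} dt_i\, P_{1D}(t_i) \right] \int_0^{\infty} dt_N\, \tilde{P}_{1D}(t_N) \tag{16}$$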

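Finally, this distribution happens to be a geometric law with parameter
$\langle \hat{j}(\lambda|x) \rangle_x$:

$$p(N) = \langle \hat{j}(\lambda|x) \rangle_x \left( 1 - \langle \hat{j}(\lambda|x) \rangle_x \right)^{N-1} \tag{17}$$

This demonstrates that the mean number of visits before reaching the
target site is:

$$\langle N \rangle = \frac{1}{\langle \hat{j}(\lambda|x) \rangle_x} = \frac{\sqrt{\lambda/D}\,(L + M)}{\tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, L \right) + \tanh\!\left( \sqrt{\tfrac{\lambda}{D}}\, M \right)} \tag{18}$$

The following form holds:

$$\langle \mu \rangle = \left( \langle N \rangle - 1 \right) \left( \frac{1}{\lambda} + \frac{1}{\lambda'} \right) \tag{19}$$

Note that the large $N$ limit is transparent: $\langle \mu \rangle$ is a
succession of approximately $N$ 1D excursions of average duration
$1/\lambda$ and $N$ 3D excursions of average duration $1/\lambda'$.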
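
A short numerical check of the geometric law may help. The sketch below assumes hypothetical parameter values and the averaged transform $\langle \hat{j}(\lambda|x) \rangle_x = \sqrt{D/\lambda}\,[\tanh(\sqrt{\lambda/D}\,L) + \tanh(\sqrt{\lambda/D}\,M)]/(L + M)$. It tabulates $p(N)$, verifies its normalization and mean, and compares $(\langle N \rangle - 1)(1/\lambda + 1/\lambda')$ with the closed-form mean search time. The function names are ours.

```python
import math

def j_avg(lam, D, L, M):
    """Probability that a single 1D scan, started at a uniform random position,
    reaches the target before dissociation."""
    k = math.sqrt(lam / D)
    return (math.tanh(k * L) + math.tanh(k * M)) / (k * (L + M))

def visit_distribution(q, n_max):
    """Geometric law p(N) = q (1 - q)^(N - 1) for N = 1..n_max."""
    return [q * (1.0 - q) ** (n - 1) for n in range(1, n_max + 1)]

def mean_search_time(lam, lam_p, D, L, M):
    """Closed-form mean search time used as the reference value."""
    k = math.sqrt(lam / D)
    visits = k * (L + M) / (math.tanh(k * L) + math.tanh(k * M))
    return (1.0 / lam + 1.0 / lam_p) * (visits - 1.0)

# Hypothetical parameters: frequencies in 1/s, D in bp^2/s, lengths in bp
lam, lam_p, D, L, M = 2.0, 5.0, 1.0e4, 300.0, 700.0

q = j_avg(lam, D, L, M)
p = visit_distribution(q, 2000)  # the tail beyond N = 2000 is negligible here

total = sum(p)
mean_visits = sum(n * pn for n, pn in enumerate(p, start=1))
mean_time = (mean_visits - 1.0) * (1.0 / lam + 1.0 / lam_p)

print(total, mean_visits, 1.0 / q)
print(mean_time, mean_search_time(lam, lam_p, D, L, M))
```

Since the law is geometric, its standard deviation is of the same order as its mean, so individual search trajectories are expected to differ widely in their number of visits.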

The second interesting quantity is the average number of distinct base
pairs visited before the protein reaches its target site. In our
continuous description, this corresponds to the average span
$\langle S \rangle$ of the 1D motion. For the sake of simplicity, the
target is here assumed to be centered on a DNA strand of half-length $L$.
The average span can be expressed as the integral over the position $x$ on
the DNA of the probability that $x$ has been visited before reaction. One
then obtains:

$$\langle S \rangle = 2 \int_0^{L} dx \int_0^{\infty} dt\, F_0(x, t) \tag{20}$$

where $F_0(x, t)$ is the first passage density at $x$ with absorbing
conditions at $x = 0$, whose Laplace transform will be explicitly computed
in the next section in the context of competitive targets. Anticipating
formula Eq. 27, the span finally reads:

$$\langle S \rangle = 2 \int_0^{L} dx \left\{ 1 + \frac{\cosh\!\left( \sqrt{\tfrac{\lambda}{D}}\,(L - x) \right) \sinh\!\left( \sqrt{\tfrac{\lambda}{D}}\,(L + x/2) \right)}{\cosh\!\left( \sqrt{\tfrac{\lambda}{D}}\, L \right) \sinh\!\left( \sqrt{\tfrac{\lambda}{D}}\,(L - x/2) \right)} \right\}^{-1} \tag{21}$$

Apparently, this integral form cannot be substantially simplified, but its
overall behaviour, and in particular the $\lambda$ dependence, is easily
elucidated. The span grows monotonically from $\frac{3}{4} L$ at
$\lambda = 0$ to $L$ for $\lambda \to \infty$. This monotonicity, as
opposed to the existence of a minimum for the mean search time, is a
striking feature of this quantity, plotted in figure 6.
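
The limiting values of the span can be checked by direct quadrature. The sketch below is an illustration under stated assumptions: the probability that a point $x$ is visited before the centered target is written as $1/(1 + \rho)$ with $\rho = [\tanh(\sqrt{\lambda/D}\,L) + \tanh(\sqrt{\lambda/D}\,x/2)]/[\tanh(\sqrt{\lambda/D}\,(L - x)) + \tanh(\sqrt{\lambda/D}\,x/2)]$, a tanh form that is algebraically equivalent to a ratio of cosh and sinh products and avoids numerical overflow. The parameter values and the midpoint rule are our choices.

```python
import math

def visit_probability(x, lam, D, L):
    """Probability that position x (0 < x < L) is visited before the target at 0,
    for 1D scans with dissociation frequency lam and uniform random relanding."""
    k = math.sqrt(lam / D)
    half = math.tanh(k * x / 2.0)
    # rho = cosh(k(L-x)) sinh(k(L+x/2)) / [cosh(kL) sinh(k(L-x/2))], in tanh form
    rho = (math.tanh(k * L) + half) / (math.tanh(k * (L - x)) + half)
    return 1.0 / (1.0 + rho)

def average_span(lam, D, L, n=20000):
    """Average span: twice the midpoint-rule integral of the visit probability on [0, L]."""
    dx = L / n
    return 2.0 * dx * sum(visit_probability((i + 0.5) * dx, lam, D, L) for i in range(n))

# Hypothetical parameters: D in bp^2/s, half-length L in bp
D, L = 1.0e4, 1000.0

span_slow = average_span(1.0e-6, D, L)  # nearly pure sliding
span_mid = average_span(1.0e-2, D, L)   # sqrt(lam / D) * L = 1
span_fast = average_span(1.0e4, D, L)   # very frequent dissociation

print(span_slow / L, span_mid / L, span_fast / L)
```

With these values the three ratios increase from about 3/4 towards 1, in line with the monotonic behaviour described in the text.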

Extended target site 

As mentioned above, the model of a point-like target site disregards the
possibility of the protein reaching the target site directly from a 3D
excursion. For this reason, we have to study the case where the target
site is an area of extension $2r$. We will now show that this new feature
significantly changes the behaviour of the search time. The reaction is
still assumed to be infinitely fast; it occurs either when the protein
reaches the boundary of the reaction area during a sliding round, or when
the protein lands on the reaction area directly after a 3D excursion.
Following the scheme already developed to derive the density of the first
passage time (Eq. 6), one obtains:

$$\hat{F}(s) = \left( \langle \hat{j}(\lambda + s|x) \rangle_r + \frac{2r}{L + M} \right) \left[ 1 - \frac{1 - \langle \hat{j}(\lambda + s|x) \rangle_r - \frac{2r}{L + M}}{(1 + s/\lambda)(1 + s/\lambda')} \right]^{-1} \tag{22}$$

where $\langle f \rangle_r = \frac{1}{L + M} \left( \int_{-M}^{-r} f\,dx + \int_{r}^{L} f\,dx \right)$.
The average search time then reads (we only give the case
$L = M = \ell$ for the sake of simplicity):

$$\langle \mu(r) \rangle = \left( \frac{1}{\lambda} + \frac{1}{\lambda'} \right) \frac{(\ell - r)\sqrt{\lambda/D} - \tanh\!\left( (\ell - r)\sqrt{\lambda/D} \right)}{r\sqrt{\lambda/D} + \tanh\!\left( (\ell - r)\sqrt{\lambda/D} \right)} \tag{23}$$

For $\ell$ large enough, the minimum is obtained for

$$\lambda_{min} = \frac{\left( \lambda' r + \sqrt{\lambda'^2 r^2 + D \lambda'} \right)^2}{D} \tag{24}$$

It is remarkable that the scaling $\lambda_{min} \sim \lambda'$ holds true
only for $\lambda' \ll D/r^2$. For larger frequencies $\lambda'$, we have
$\lambda_{min} \sim 4 \lambda'^2 r^2 / D$. The value of the search time at
the minimum, $\langle \mu(r) \rangle_{min}$, is modified. For small $r$ we
get:

$$\langle \mu(r) \rangle_{min} = \frac{2\ell}{\sqrt{\lambda' D}} - \frac{2 \ell r}{D} + O(r^2) \tag{25}$$

whereas for larger $r$ the expansion reads:

$$\langle \mu(r) \rangle_{min} = \frac{\ell}{\lambda' r} - \frac{D \ell}{4 \lambda'^2 r^3} + O(1/r^5) \tag{26}$$
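
The two regimes of the optimal frequency can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes the mean search time for a centered extended target in the form $(1/\lambda + 1/\lambda')\,[(\ell - r)k - \tanh((\ell - r)k)]/[rk + \tanh((\ell - r)k)]$ with $k = \sqrt{\lambda/D}$, and the large-$\ell$ optimum in the form $(\lambda' r + \sqrt{\lambda'^2 r^2 + D\lambda'})^2/D$. The parameter values and the golden-section minimizer are our choices.

```python
import math

def mean_search_time_extended(lam, lam_p, D, ell, r):
    """Mean search time for a centered target of half-width r on a strand of half-length ell."""
    k = math.sqrt(lam / D)
    th = math.tanh((ell - r) * k)
    return (1.0 / lam + 1.0 / lam_p) * ((ell - r) * k - th) / (r * k + th)

def lambda_min_large_ell(lam_p, D, r):
    """Large-ell estimate of the optimal 1D frequency (assumed form, see lead-in)."""
    return (lam_p * r + math.sqrt(lam_p**2 * r**2 + D * lam_p)) ** 2 / D

def golden_section_min(f, lo, hi, tol=1e-9):
    """Locate the minimum of a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Hypothetical parameters: lam_p in 1/s, D in bp^2/s, lengths in bp
lam_p, D, ell = 5.0, 1.0e4, 1.0e6

# Small target (lam_p << D / r^2): the optimum stays of the order of lam_p
r_small = 5.0
lam_numeric = golden_section_min(
    lambda lam: mean_search_time_extended(lam, lam_p, D, ell, r_small), 0.1, 100.0)
lam_formula = lambda_min_large_ell(lam_p, D, r_small)
print(lam_numeric, lam_formula)

# Point-like limit and large-target regime of the closed-form estimate
lam_point = lambda_min_large_ell(lam_p, D, 0.0)
lam_large_r = lambda_min_large_ell(lam_p, D, 1000.0)
print(lam_point, lam_large_r, 4.0 * lam_p**2 * 1000.0**2 / D)
```

For a large target the optimal frequency grows well beyond the excursion frequency: frequent relanding then pays off because a landing has a sizeable chance of hitting the target directly.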

We now consider the case of two target sites in order to compare the 
model to experimental results. 

Case of two competitive target sites 

The biological system (Stanford, et al., 2000) consists of two target
sites for the restriction enzyme EcoRV integrated into a 690 bp linear DNA
substrate. The position along the DNA strand of the first target site,
which will be called target 1, is fixed at 120 bp. The second target site,
which will be called target 2, has been placed at 54 bp, 200 bp, and
387 bp from the first target site. Thus, three substrates (Fig. 5) were
used to analyze the kinetics of DNA cleavage. Each assay was carried out
at a very low concentration of enzyme with regard to the concentration of
DNA. For higher concentrations of enzyme, the probability for two or more
molecules to act on the same DNA strand would not be negligible. The
cleavage of DNA produces fragments of different lengths. An enzyme can cut
target 1, target 2, or both, resulting in 5 lengths of fragments. The
authors observed the initial formation of four of these: the A, BC, C and
AB types.

The advantage of this construction is that the first cleavage process gives
a starting point to elucidate how EcoRV will cleave the second target site.
In contrast, when using constructions with one target site, the primary
pathway of the enzyme to reach the DNA domain can dominate the kinetics of
the search process. For example, in highly diluted DNA solutions, the DNA
domains are separated by long distances, and the mean time spent by the
enzyme in reaching a DNA domain then contributes in a non-negligible
manner to the total mean time needed to find the target site. Moreover,
our theoretical model supposes that the enzyme starts on the DNA and
therefore does not include the primary encounter. This assumption agrees
with the case of experimental substrates with two target sites.

Conditional search time density 

In order to get a better understanding of this process, we first study
analytically the distribution of the search time $t$ of one target, for
instance 2, knowing that no reaction occurred at target 1. We denote by
$F_1(2, t)$ this conditional search time density averaged over the initial
condition. We make use of the general method developed in the first
section to derive this quantity. Indeed, this problem involves a
combination of 3D excursions and 1D motions, its peculiarity being that
the 1D motion is a constrained diffusion, as reaction with target 1 is
excluded. It suffices then to rewrite formula Eq. 6 as follows:

$$\hat{F}_1(2, s) = \langle \hat{j}_1(\lambda + s|2, x) \rangle_x \left[ 1 - \frac{1 - \langle \hat{j}_1(\lambda + s|2, x) \rangle_x - \langle \hat{j}_2(\lambda + s|1, x) \rangle_x}{(1 + s/\lambda)(1 + s/\lambda')} \right]^{-1} \tag{27}$$

The first factor is built on $\langle \hat{j}_1(s|2, x) \rangle_x$, the
Laplace transform of the first passage density at 2 avoiding 1 for a
standard 1D diffusion, and corresponds to the last excursion on DNA before
finding the target 2. In turn, the term proportional to
$\left( 1 - \langle \hat{j}_1(\lambda + s|2, x) \rangle_x - \langle \hat{j}_2(\lambda + s|1, x) \rangle_x \right) / (\lambda + s)$
is the Laplace transform of the survival probability, and comes from the
succession of non-reactive excursions on DNA. These quantities are
obtained by standard methods, considering successively the initial
condition on fragment A (with mixed boundary conditions), B (with
absorbing boundary conditions), and C (with mixed boundary conditions).
This finally yields
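
$$\langle \hat{j}_1(s|2, x) \rangle_x = \frac{1}{a + b + c} \sqrt{\frac{D}{s}} \left[ \tanh\!\left( \sqrt{\tfrac{s}{D}}\, c \right) + \frac{\cosh\!\left( \sqrt{\tfrac{s}{D}}\, b \right) - 1}{\sinh\!\left( \sqrt{\tfrac{s}{D}}\, b \right)} \right] \tag{28}$$

and

$$\langle \hat{j}_2(s|1, x) \rangle_x = \frac{1}{a + b + c} \sqrt{\frac{D}{s}} \left[ \tanh\!\left( \sqrt{\tfrac{s}{D}}\, a \right) + \frac{\cosh\!\left( \sqrt{\tfrac{s}{D}}\, b \right) - 1}{\sinh\!\left( \sqrt{\tfrac{s}{D}}\, b \right)} \right] \tag{29}$$

where $a$, $b$, $c$ denote the lengths of fragments A, B, C respectively.
This set of equations fully describes the problem, and will be used in the
next section to analyze experimental data. In particular, the mean
conditional search time can be deduced straightforwardly from Eq. 27; its
explicit form is not given here for the sake of simplicity.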
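
To make the conditional density more concrete, the sketch below evaluates its value at $s = 0$, that is, the probability that target 2 is reached before target 1. It is a minimal illustration under stated assumptions: the averaged transforms are taken as $\sqrt{D/s}\,[\tanh(\sqrt{s/D}\,c) + \tanh(\sqrt{s/D}\,b/2)]/(a + b + c)$ and its counterpart with $a$ in place of $c$ (using $(\cosh u - 1)/\sinh u = \tanh(u/2)$), the values of $\lambda$, $\lambda'$ and $D$ are hypothetical, and target 2 is assumed to lie on the far side of target 1 from the 120 bp end, so that $c = 690 - 120 - b$.

```python
import math

def j_to_2_avoiding_1(s, D, a, b, c):
    """Averaged transform of the first-passage density at target 2 avoiding target 1
    (contributions from fragment C, mixed boundaries, and fragment B, absorbing)."""
    k = math.sqrt(s / D)
    return (math.tanh(k * c) + math.tanh(k * b / 2.0)) / (k * (a + b + c))

def j_to_1_avoiding_2(s, D, a, b, c):
    """Mirror quantity: first passage at target 1 avoiding target 2."""
    k = math.sqrt(s / D)
    return (math.tanh(k * a) + math.tanh(k * b / 2.0)) / (k * (a + b + c))

def conditional_density_hat(s, lam, lam_p, D, a, b, c):
    """Laplace transform of the search time density for target 2, no reaction at 1."""
    j2 = j_to_2_avoiding_1(lam + s, D, a, b, c)
    j1 = j_to_1_avoiding_2(lam + s, D, a, b, c)
    return j2 / (1.0 - (1.0 - j2 - j1) / ((1.0 + s / lam) * (1.0 + s / lam_p)))

def prob_target_2_first(lam, lam_p, D, a, b, c):
    """Probability that target 2 is reached before target 1 (value at s = 0)."""
    return conditional_density_hat(0.0, lam, lam_p, D, a, b, c)

# Hypothetical kinetic parameters (1/s, 1/s, bp^2/s); geometry from the text (bp)
lam, lam_p, D = 2.0, 5.0, 1.0e4
a, total = 120.0, 690.0
for b in (54.0, 200.0, 387.0):
    c = total - a - b
    print(b, prob_target_2_first(lam, lam_p, D, a, b, c))

# Limiting cases for the 54 bp substrate
p_sliding = prob_target_2_first(1.0e-6, lam_p, D, 120.0, 54.0, 516.0)
p_jumping = prob_target_2_first(1.0e4, lam_p, D, 120.0, 54.0, 516.0)
print(p_sliding, (516.0 + 27.0) / 690.0, p_jumping)
```

In the pure sliding limit the probability reduces to the geometric weight $(c + b/2)/(a + b + c)$, whereas for very frequent dissociation both targets become equally likely, which is why the dependence on the inter-site distance carries information on $\lambda$.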

Preference and processivity 

In order to get quantitative measurements of the pathway of the enzyme,
the authors of (Stanford, et al., 2000) introduced two concepts: preference
and processivity. The value of the preference $P$ quantifies the
preferential use of target 2 by EcoRV. The $P$ value is experimentally
obtained by taking the ratio of the initial formation rate $v_{AB}$ of AB
substrates (resulting from cleavage at target site 2) to the initial
formation rate $v_{BC}$ of BC substrates (resulting from cleavage at
target site 1):

$$P = \frac{v_{AB}}{v_{BC}} \tag{30}$$

The processivity quantifies the fraction of the cleaved DNA that is cleaved
first at one target site and then at the second target site during a
single encounter of the DNA substrate with an enzyme. The processivity of
the restriction enzyme from target 2 to target 1 can be deduced from
experimental data by introducing the processivity factor
$f_{p21} = (v_C - v_{AB})/(v_C + v_{AB})$. One can define in the same
manner a symmetric quantity, the processivity factor of the reaction with
target 1 and then target 2, $f_{p12} = (v_A - v_{BC})/(v_A + v_{BC})$, and
then the total processivity factor, which represents the fraction of both
processive actions:

$$f_p = \frac{v_A + v_C - v_{AB} - v_{BC}}{v_A + v_C + v_{AB} + v_{BC}} \tag{31}$$
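
These definitions translate directly into a few lines of code. The sketch below is illustrative only: the four initial formation rates are made-up numbers, not measured values, and the total processivity factor is taken as $(v_A + v_C - v_{AB} - v_{BC})/(v_A + v_C + v_{AB} + v_{BC})$, which is our reading of the definition and should be treated as an assumption.

```python
def preference(v_ab, v_bc):
    """Preference P: ratio of the initial formation rates of AB and BC fragments."""
    return v_ab / v_bc

def processivity_factor(v_end, v_intermediate):
    """Processivity factor for one direction, e.g. f_p21 with (v_C, v_AB)
    or f_p12 with (v_A, v_BC)."""
    return (v_end - v_intermediate) / (v_end + v_intermediate)

def total_processivity(v_a, v_c, v_ab, v_bc):
    """Total processivity factor combining both directions (assumed form)."""
    return (v_a + v_c - v_ab - v_bc) / (v_a + v_c + v_ab + v_bc)

# Hypothetical initial formation rates (arbitrary units), for illustration only
v_a, v_c, v_ab, v_bc = 1.0, 1.2, 0.6, 0.5

P = preference(v_ab, v_bc)
f_p21 = processivity_factor(v_c, v_ab)
f_p12 = processivity_factor(v_a, v_bc)
f_p = total_processivity(v_a, v_c, v_ab, v_bc)
print(P, f_p21, f_p12, f_p)

# Sanity checks on the definition: without any processive action every cut at
# target 1 yields one A and one BC, so v_A = v_BC and the factor vanishes; if
# every first cut is followed by the second one, no BC accumulates and it is 1.
print(processivity_factor(1.0, 1.0), processivity_factor(1.0, 0.0))
```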



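The next sections deal with these two quantities obtained from our model
by considering, instead of substrate production rates, the
enzyme-to-target(s) association rates, namely $\nu_1$, $\nu_2$,
$\nu_{21}$ and $\nu_{12}$, which are defined by the following elementary
reactions:

$$\begin{aligned}
\text{DNA} &\to \text{A} + \text{BC} && \text{with rate } \nu_1 \\
\text{DNA} &\to \text{AB} + \text{C} && \text{with rate } \nu_2 \\
\text{DNA} &\to \text{A} + \text{BC} \to \text{A} + \text{B} + \text{C} && \text{with rate } \nu_{21} \\
\text{DNA} &\to \text{AB} + \text{C} \to \text{A} + \text{B} + \text{C} && \text{with rate } \nu_{12}
\end{aligned} \tag{32}$$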

We assume that a restriction enzyme hits a DNA molecule at site x with
homogeneous probability per unit time $\kappa\,dx/(L + M)$. The enzyme
concentration is chosen sufficiently small that multiple-encounter events are
negligible. Consequently, a fragment BC (or AB) can be cut into B and C
(or A and B) only if the enzyme that cleaved the DNA molecule to give BC
(or AB) remains on this fragment (the probability of this event, which
depends in detail on the chemical mechanism, is denoted $p_{init}$) and then
finds site 2 (or 1). The reaction rates are then:

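$$ v_1 = \kappa \int_{-\infty}^{t} dt' \int_{DNA} \frac{dx}{L+M}\, F_2(1,x,t-t') = \kappa \left\langle F_2(1,x,s=0) \right\rangle_x \qquad (33) $$

and

$$ v_{12} = \kappa\, p_{init} \int_{-\infty}^{t} dt'\, F(1,2,t-t') \int_{-\infty}^{t'} dt'' \int_{DNA} \frac{dx}{L+M}\, F_1(2,x,t'-t'') = \kappa\, p_{init}\, F(1,2,s=0) \left\langle F_1(2,x,s=0) \right\rangle_x \qquad (34) $$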

where $F_z(y,x,t)$ is the first-passage density at point y at time t,
starting from x and avoiding z. This quantity is accessible analytically
using Eq. 27. The quantity $F(y,x,t)$ is the first-passage density at point y
at time t starting from x. The two other rates, $v_2$ and $v_{21}$, are
obtained straightforwardly by permuting the symbols 1 and 2. One is now able
to derive the processivity and preference factors.
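
The link between the measured product formation rates and the four elementary rates can be checked numerically. The sketch below assumes simple mass balance: every cut at site 1 releases A, every cut at site 2 releases C, and the intermediates AB and BC are consumed by the processive second cut. The function names and the sample rate values are illustrative assumptions, not experimental quantities.

```python
def product_rates(v1, v2, v21, v12):
    """Formation rates of A, C, AB and BC from the elementary rates.

    Assumed mass balance: v1 and v2 count all first cuts at sites 1 and 2;
    v21 (resp. v12) counts the first cuts at site 1 (resp. 2) that are
    followed by a processive cut at the other site.
    """
    return {
        "A": v1 + v12,   # cut at site 1, either first or second
        "C": v2 + v21,   # cut at site 2, either first or second
        "AB": v2 - v12,  # AB produced at site 2, lost if site 1 is then cut
        "BC": v1 - v21,  # BC produced at site 1, lost if site 2 is then cut
    }


def preference(rates):
    """Preference P = v_AB / v_BC (Eq. 30)."""
    return rates["AB"] / rates["BC"]


def total_processivity(rates):
    """Total processivity factor built from the four product rates (Eq. 31)."""
    num = rates["A"] + rates["C"] - rates["AB"] - rates["BC"]
    den = rates["A"] + rates["C"] + rates["AB"] + rates["BC"]
    return num / den


if __name__ == "__main__":
    # Arbitrary illustrative rates (hypothetical, arbitrary units).
    v1, v2, v21, v12 = 3.0, 5.0, 0.6, 1.0
    rates = product_rates(v1, v2, v21, v12)
    print("P   =", preference(rates), "vs", (v2 - v12) / (v1 - v21))
    print("f_p =", total_processivity(rates), "vs", (v12 + v21) / (v1 + v2))
```

Under this bookkeeping the total processivity factor reduces to $(v_{12} + v_{21})/(v_1 + v_2)$ and the preference to $(v_2 - v_{12})/(v_1 - v_{21})$, whatever the values of the four rates.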

Results 

We recall that the lengths of fragments A, B and C are denoted by the
lower-case letters a, b and c respectively. First, we evaluate the 1D
frequency $\lambda$ by comparing the theoretical preference with experimental
data. Then, using the value of $\lambda'$ that satisfies the optimal search
time (this assumption is justified below), we deduce several quantities
related to the enzyme pathway linking the first target site to the second
one. Last, by comparing the analytical expression of the processivity factor
with experimental data, we introduce a parameter associated with the
dynamics: $\pi_r$, the probability that after an excursion the enzyme
reassociates with the same DNA substrate it has left.

Preference 

The preference for target site 2 over target site 1 is given by

$$ P = \frac{v_{AB}}{v_{BC}} = \frac{v_2 - v_{12}}{v_1 - v_{21}} = \frac{\langle F_1(2,x,s=0) \rangle_x}{\langle F_2(1,x,s=0) \rangle_x} \qquad (35) $$

where $v_X = dX/dt$ is the rate of formation of species X, which can be
measured experimentally. Explicitly:



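$$ P = \frac{\tanh\left(\sqrt{\lambda/D}\,c\right) + \left(\cosh\left(\sqrt{\lambda/D}\,b\right) - 1\right)/\sinh\left(\sqrt{\lambda/D}\,b\right)}{\tanh\left(\sqrt{\lambda/D}\,a\right) + \left(\cosh\left(\sqrt{\lambda/D}\,b\right) - 1\right)/\sinh\left(\sqrt{\lambda/D}\,b\right)} \qquad (36) $$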



This form expresses the preference as a function of b and reveals, in
particular, that the preferred target site is the one closest to the middle
of the molecule. It fits the experimental data well (Fig. 7) and allows one
to determine the only free parameter, $\sqrt{\lambda/D}$. The best fit is
obtained for $\sqrt{\lambda/D} = 8.7 \times 10^{-2}$ bp$^{-1}$. For a
representative fast one-dimensional diffusion coefficient
$D = 5 \times 10^{5}$ bp$^2$/s (Erskine et al., 1997), the 1D frequency is
$\lambda = 37.5$ s$^{-1}$. The average time spent by the restriction enzyme
on DNA per visit then equals 0.027 s, and the average distance scanned per
visit, $\sqrt{16D/(\pi\lambda)}$, is 260 bp. Using Eq. EU we obtain a
representative average number of distinct sites visited on the DNA during the
search process: $\langle S \rangle \approx 320$ bp.

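
The expressions above are easy to evaluate numerically. The following sketch implements Eq. 36 with $k = \sqrt{\lambda/D}$ left as a free parameter, together with the two per-visit quantities computed from the quoted $\lambda$ and $D$. The function names are illustrative, and the value a = 120 bp is inferred here from the 690 bp substrate length and b + c = 570 bp; it is an assumption of this sketch rather than a number stated explicitly in the text.

```python
import math


def preference(a, b, c, k):
    """Eq. 36: preference for site 2 over site 1, with k = sqrt(lambda/D) in 1/bp."""
    # Contribution of the inter-site fragment B, common to both sites.
    shared = (math.cosh(k * b) - 1.0) / math.sinh(k * b)
    return (math.tanh(k * c) + shared) / (math.tanh(k * a) + shared)


def residence_time(lam):
    """Mean time spent on DNA per visit, 1/lambda (s)."""
    return 1.0 / lam


def scan_length(lam, D):
    """Mean distance scanned per visit, sqrt(16 D / (pi lambda)) (bp)."""
    return math.sqrt(16.0 * D / (math.pi * lam))


if __name__ == "__main__":
    lam, D = 37.5, 5.0e5          # s^-1 and bp^2/s, as quoted in the text
    print("time per visit (s):", residence_time(lam))
    print("scan length (bp):  ", scan_length(lam, D))

    a, total = 120.0, 690.0       # a inferred from 690 bp and b + c = 570 bp
    for b in (54.0, 200.0, 387.0):
        c = total - a - b
        only_sliding = preference(a, b, c, 1e-5)   # k -> 0, i.e. lambda -> 0
        no_sliding = preference(a, b, c, 1.0)      # large k, i.e. lambda -> infinity
        print(b, only_sliding, (c + b / 2) / (a + b / 2), no_sliding)
```

The two limits printed in the loop correspond to the limiting cases shown in Fig. 7: for $k \to 0$ (only sliding) Eq. 36 tends to $(c + b/2)/(a + b/2)$, and for large $k$ (no sliding) it tends to 1.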
Enzyme pathway 

Further analysis requires the value of the parameter $\lambda'$, which
depends strongly on experimental conditions such as the DNA concentration. It
could be obtained experimentally as the protein/DNA association rate; here we
choose a typical value corresponding to the optimal search strategy, i.e.
$\lambda = \lambda'$. This assumption is supported by the fact that target
site localization is several orders of magnitude faster than the diffusion
limit. Using the same calculation as from Eq. El to Eq. HU, without averaging
over the initial position of the enzyme, we obtain the mean time needed by
the restriction enzyme to go from target 1 to target 2:




WH. ^|| WiiliL_,| (37) 

tanh(W^6) + tanh(W ^c) 



Using the formula [H2, the average search time of target 2 for a reactive
pathway of an enzyme starting from target 1, with an inter-site spacing of
54 bp, is $\langle \mu \rangle \approx 0.016$ s. Using the formula EH, the
average number of DNA visits before the processive cleavage is
$N \approx 1.3$. The same quantities for the other inter-target distances,
namely 200 bp and 387 bp, are respectively
$\langle \mu \rangle \approx 0.072$ s, $N \approx 2.4$, and
$\langle \mu \rangle \approx 0.10$ s, $N \approx 2.9$.

Processivity 

Using the previous results, the processivity factor takes the following
form:

$$ f_p = \frac{v_{12} + v_{21}}{v_1 + v_2} = p_{init}\, F(1,2,s=0) \qquad (38) $$

Here we have to refine the derivation of $F(1,2,s=0)$, i.e. the probability
of ever reaching 1 starting from 2. The crucial point concerns the dilution
approximation; hence we treat the case of a single enzyme. We take into
account the fact that during each 3D excursion the protein can escape and
thus be definitively lost. We denote by $\pi_r$ the probability of return
after a 3D excursion. Rigorously, this quantity depends on physical
parameters such as the DNA length and the typical size of its attractive
domain. As the lengths of the DNA substrates are constant in the experiments
of Stanford et al. (Stanford et al., 2000), for which b + c = 570 bp, we
consider a constant $\pi_r$.

We finally obtain:



$$ f_p = p_{init} \left[ j(\lambda|2,1) + \frac{\pi_r \langle j(\lambda|x) \rangle \left( 1 - j(\lambda|2,1) \right)}{1 - \pi_r + \pi_r \langle j(\lambda|x) \rangle} \right] \qquad (39) $$



where $\langle j(\lambda|x) \rangle$ is given by Eq. [TT] with L = c and
M = b, and where $j(\lambda|2,1)$ is the Laplace transform of the
first-passage density at 2 starting from 1, which is given by Eq. HH with
x = M = b:



$$ j(\lambda|2,1) = \frac{1}{\cosh\left(\sqrt{\lambda/D}\,b\right)} \qquad (40) $$



Using the value of $\lambda$ obtained previously, there are two unknown
parameters: $p_{init}$ and $\pi_r$. They can be determined from the
experimental data (Fig. 8); the best fit is obtained for $p_{init} = 0.5$
and $\pi_r = 0.85$. However, these values may not be very accurate, as is
often the case when two parameters are estimated by fitting theoretical
results to experimental data.
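
As an illustration, Eqs. 39 and 40 can be evaluated for the three inter-site distances. This is only a sketch under stated assumptions: the position-averaged transform $\langle j(\lambda|x) \rangle$ is not reproduced in this section, so the reflecting-end expression $(\tanh kL + \tanh kM)/(k(L+M))$ used below is an assumed form, $k = \sqrt{\lambda/D}$ is computed from the representative $\lambda$ and $D$ quoted earlier, and all function names are hypothetical.

```python
import math


def j_end(k, b):
    """Eq. 40: transform of the first-passage density between the two targets."""
    return 1.0 / math.cosh(k * b)


def j_mean(k, L, M):
    """ASSUMED position-averaged transform for a target with arms of length L and M."""
    return (math.tanh(k * L) + math.tanh(k * M)) / (k * (L + M))


def processivity(k, b, c, p_init, pi_r):
    """Eq. 39: processivity factor with return probability pi_r after an excursion."""
    q = j_end(k, b)          # target reached during the first sliding phase
    Q = j_mean(k, c, b)      # target reached in one sliding phase after a random rebinding
    return p_init * (q + pi_r * Q * (1.0 - q) / (1.0 - pi_r + pi_r * Q))


if __name__ == "__main__":
    lam, D = 37.5, 5.0e5                 # representative values quoted in the text
    k = math.sqrt(lam / D)               # 1/bp
    for b in (54.0, 200.0, 387.0):
        c = 570.0 - b                    # b + c = 570 bp
        print(b, processivity(k, b, c, p_init=0.5, pi_r=0.85))
```

Two limits serve as sanity checks on the formula: for $\pi_r = 0$ only the first sliding phase contributes, so $f_p = p_{init}\, j(\lambda|2,1)$, and for $\pi_r = 1$ the enzyme is never lost, so $f_p = p_{init}$.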

We discuss some possible hypotheses arising from these two fitted parameters
in the conclusion that follows.

Conclusion

So far, experimental investigations have allowed one to discriminate between
two translocation processes, pure sliding or pure jumping. To obtain
quantitative measurements for a compound translocation process, it is
necessary to build a physically reliable model, as Berg et al. (Berg et al.,
1981) did for a single target site. The model presented here permits us to
obtain numerous quantities determining the pathway followed by a restriction
enzyme in finding one target site, or two competing target sites, on DNA
through a series of 1D diffusion periods (sliding) followed by 3D excursions
(jumping). The corresponding mean search time shows that such a two-step
process is faster than pure sliding or pure 3D diffusion. The existence and
the optimization of such a search time are discussed, and the length
dependence of the optimum was obtained.

Using the preference data from assays on EcoRV (Stanford et al., 2000), we
quantify the parameter characterizing the pathway of EcoRV, namely the 1D
residence frequency $\lambda$. Other quantities were extracted from this
parameter: the mean distance scanned by the restriction enzyme during one
binding event (260 bp), the distribution of the number of visits on DNA
prior to cleaving the target site, and the average number of distinct DNA
sites visited. It should be noted that the small value of the mean distance
scanned might be due to the assumption of a perfectly reactive target site,
which leads to an overestimated $\lambda$. In fact, an imperfectly reactive
target site would decrease the preference. Using the data on processivity
for EcoRV, we introduce two secondary parameters characterizing the detailed
pathways of the restriction enzyme after DNA cleavage. These parameters come
into play




when more than one target site is present on the DNA. The first parameter
is the probability for the enzyme to stay (after cleavage at a target site)
on the DNA strand that harbors the second target site. It was assumed that
this probability equals 1/2, as the DNA sequences bordering the target site
are almost symmetric. Our best fit suggests that the probability is indeed
close to 0.5, justifying the common assumption. The second parameter,
$\pi_r$, is the probability for the enzyme to rebind to the cleaved DNA
strand it left during an excursion. Because of the short length of the DNA
substrates, it is usually assumed that the enzyme is "lost" after
dissociating from the DNA, which means that the enzyme binds a previously
unvisited DNA substrate after each 3D excursion. This probability had
therefore been assumed to be negligible. Our model reveals that this
probability is high (0.85), which shows that the enzyme frequently rebinds
to the same DNA substrate. The high value of $\pi_r$ may be explained by the
fact that the fragment length $\ell$ (here b + c = 570 bp) is significantly
larger than the persistence length (150 bp). The configuration of the DNA is
therefore close to a globule, in which the protein can be trapped and from
which it escapes with a rather low probability. However, $\pi_r$ may be
overestimated because we neglect the correlations between the starting and
finishing points of the 3D excursions. Indeed, these correlations would (for
small values of the inter-target distance b) increase the processivity
factor and therefore lower $\pi_r$. Note that an imperfect reaction would
lower the processivity, as in this case the enzyme can pass through the
target site without reacting, thereby increasing the probability of a
definitive departure from the DNA strand.

The present model classifies the stochastic pathway followed by a
restriction enzyme searching for its target site, by quantifying the
dynamical parameters. Our work lies within the framework of stochastic
dynamics, which governs the biological processes occurring in the highly
structured and crowded medium of in vivo systems. Moreover, this model can
be helpful for generic situations where a protein has to find a target site
on a DNA substrate, e.g. the numerous transcription factors needed to
trigger gene activation.

We are grateful to M. Barbi, G. Oshanin, and J.M. Victor (LPTL) for useful 
discussions. We are also grateful to J. Coppey and M. Jardat for specific comments 
on the manuscript. The numerous pertinent comments, criticisms and suggestions 
given by one referee were deeply appreciated. 

Figure legend

Figure 1. A representative path of the restriction enzyme which reaches 
the target site. Excursions in the solution are represented by dashed lines, one- 
dimensional diffusion by continuous lines. The filled square is the target site. 

Figure 2. Representative view of the model. Here the protein executes
three excursions before finding the target site.

Figure 3. Extended target site. 

Figure 4. Schematic representation of the three substrates of length 690 bp. 
The position of the second target site relative to the first target equals 54 bp, 200 
bp and 387 bp, respectively. 

Figure 5. The mean search time plotted against the one-dimensional residence
frequency $\lambda$. The length of DNA is 5000 bp, the three-dimensional
residence frequency is 10 s$^{-1}$ and the 1D diffusion coefficient is
$5 \times 10^{5}$ bp$^2$/s.

Figure 6. The average number of distinct DNA sites visited by the enzyme
against the one-dimensional residence frequency $\lambda$. The half-length
of DNA is 100 bp, which allows one to also read this number as a percentage.

Figure 7. The preference of the protein for target site 2 over target site
1. The solid line represents the fitted solution, which gives
$\sqrt{\lambda/D} = 8.7 \times 10^{-2}$ bp$^{-1}$. The two dashed lines
correspond to the limiting cases when there is no sliding (straight line,
$\lambda = \infty$) and when there is only sliding (upper line,
$\lambda = 0$). The other parameters were drawn from experimental data
(l = 690 bp).

Figure 8. The processive action of the restriction enzyme. Dashed lines
represent two fitted solutions of the model of Stanford (Stanford et al.,
2000) with pure sliding. The two solid lines represent the solutions of our
model for $\sqrt{\lambda/D} = 8.7 \times 10^{-2}$ bp$^{-1}$ and
$p_{init} = 0.5$: one for $\pi_r = 0$, and the other, which passes near the
experimental points, for $\pi_r = 0.85$.
Figure 7: